REHIST: Relative Error Histogram Construction Algorithms
Abstract
Histograms and wavelet synopses are useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, such as V-Optimal, optimize absolute error measures, under which estimating a true value of 10 by 20 has the same effect as estimating a true value of 1000 by 1010. However, several researchers have recently pointed out the drawbacks of such schemes and proposed wavelet-based schemes to minimize relative error measures. None of these schemes provides satisfactory guarantees, and we provide evidence that the difficulty may lie in the choice of wavelets as the representation scheme. In this paper, we consider histogram construction for the known relative error measures. We develop optimal as well as fast approximation algorithms. We provide a comprehensive theoretical analysis and demonstrate, on synthetic and real-life data sets, that these algorithms give significantly more accurate answers.
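The contrast between the two error measures can be illustrated with the abstract's own numbers. The following is a minimal sketch, not the paper's algorithm: the function names and the `sanity` floor (a lower bound on the denominator, commonly used in the relative-error literature to avoid division by very small values) are assumptions made for illustration.

```python
def absolute_error(true_value, estimate):
    """Absolute error: the size of the miss, independent of the true value."""
    return abs(true_value - estimate)


def relative_error(true_value, estimate, sanity=1.0):
    """Relative error: the miss scaled by the magnitude of the true value.

    `sanity` is a hypothetical floor on the denominator so that true
    values near zero do not produce unbounded errors.
    """
    return abs(true_value - estimate) / max(abs(true_value), sanity)


# The two cases from the abstract: (true value, estimate).
cases = [(10, 20), (1000, 1010)]

for true_value, estimate in cases:
    print(
        f"true={true_value}, estimate={estimate}, "
        f"absolute={absolute_error(true_value, estimate)}, "
        f"relative={relative_error(true_value, estimate):.2f}"
    )
```

Both cases have an absolute error of 10, but the relative error is 1.00 (a 100% miss) in the first case and 0.01 (a 1% miss) in the second, which is the distinction a relative error measure is meant to capture.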
Similar resources
Optimality and Scalability in Lattice Histogram Construction
The Lattice Histogram is a recently proposed data summarization technique that achieves approximation quality preferable to that of an optimal plain histogram. Like other hierarchical synopsis methods, a lattice histogram (LH) aims to approximate data using a hierarchical structure. Still, this structure is not defined a priori; it is an unknown of the problem, not a given. Past work has...
On-line Parametric Histogram Equ for Noise Robust Embedded Sp
In this paper, two low-complexity histogram equalization algorithms are presented that significantly reduce the mismatch between training and testing conditions in HMM-based automatic speech recognizers. The proposed algorithms use Gaussian approximations for the initial and target distributions and perform a linear mapping between them. We show that even this simplified mapping can improve the...
Local Search in Histogram Construction
The problem of dividing a sequence of values into segments occurs in database systems, information retrieval, and knowledge management. The challenge is to select a finite number of boundaries for the segments so as to optimize an objective error function defined over those segments. Although this optimization problem can be solved in polynomial time, the algorithm which achieves the minimum er...
DigitHist: a Histogram-Based Data Summary with Tight Error Bounds
We propose DigitHist, a histogram summary for selectivity estimation on multi-dimensional data with tight error bounds. By combining multi-dimensional and one-dimensional histograms along regular grids of different resolutions, DigitHist provides an accurate and reliable histogram approach for multi-dimensional data. To achieve a compact summary, we use a sparse representation combined with a n...
Error minimization in approximate range aggregates
Histogram techniques have been used in many commercial database management systems to estimate query result sizes. Recently, they have been shown to be very effective in supporting approximate query processing, especially for aggregates. In this paper, we investigate the problem of minimizing average errors of approximate aggregates using histogram techniques. Firstly, we present a novel lin...
Publication date: 2004